34 research outputs found

    Targeted KRAS Mutation Assessment on Patient Tumor Histologic Material in Real Time Diagnostics

    BACKGROUND: Testing for tumor specific mutations on routine formalin-fixed paraffin-embedded (FFPE) tissues may predict response to treatment in Medical Oncology and has already entered diagnostics, with KRAS mutation assessment as a paradigm. The highly sensitive real time PCR (Q-PCR) methods developed for this purpose are usually standardized under optimal template conditions. In routine diagnostics, however, suboptimal templates pose a challenge. Herein, we addressed the applicability of sequencing and two Q-PCR methods on prospectively assessed diagnostic cases for KRAS mutations. METHODOLOGY/PRINCIPAL FINDINGS: Tumor FFPE-DNA from 135 diagnostic and 75 low-quality control samples was obtained upon macrodissection, tested for fragmentation and assessed for KRAS mutations with dideoxy-sequencing and with two Q-PCR methods (Taqman-minor-groove-binder [TMGB] probes and DxS-KRAS-IVD). Samples with relatively well preserved DNA could be accurately analyzed with sequencing, while Q-PCR methods yielded informative results even in cases with very fragmented DNA (p<0.0001) with 100% sensitivity and specificity vs each other. However, Q-PCR efficiency (Ct values) also depended on DNA fragmentation (p<0.0001). Q-PCR methods were sensitive enough to detect ≤1% mutant cells, provided that samples yielded cycle thresholds (Ct) <29, but this condition was met in only 38.5% of diagnostic samples. In comparison, FFPE samples (>99%) could accurately be analyzed at a sensitivity level of 10% (external validation of TMGB results). DNA quality and tumor cell content were the main reasons for discrepant sequencing/Q-PCR results (1.5%). CONCLUSIONS/SIGNIFICANCE: Diagnostic targeted mutation assessment on FFPE-DNA is very efficient with Q-PCR methods in comparison to dideoxy-sequencing. However, DNA fragmentation/amplification capacity and tumor DNA content must be considered for the interpretation of Q-PCR results in order to provide accurate information for clinical decision making.
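The Ct-dependent sensitivity reported in this abstract can be illustrated with a minimal Python sketch. The function names, the hard switch between a 1% and a 10% detection limit at Ct 29, and the rule for trusting a negative result are illustrative assumptions built on the figures quoted above; they are not the authors' published procedure.

```python
# Hypothetical helpers: names and decision rules are assumptions for illustration.

def detection_limit(ct: float, ct_cutoff: float = 29.0) -> float:
    """Assumed lowest detectable mutant-cell fraction for a sample.

    Samples with Ct below the cutoff are treated as sensitive down to 1%
    mutant cells; all other samples are treated as sensitive down to 10%.
    """
    return 0.01 if ct < ct_cutoff else 0.10


def negative_result_is_reliable(ct: float, tumor_cell_fraction: float) -> bool:
    """Assumed rule: a wild-type call is trusted only when the estimated
    tumor cell fraction is at least the detection limit for that Ct."""
    return tumor_cell_fraction >= detection_limit(ct)


if __name__ == "__main__":
    # A well-amplifying sample versus a poorly amplifying one, both 5% tumor cells.
    print(negative_result_is_reliable(27.5, 0.05))
    print(negative_result_is_reliable(31.0, 0.05))
```

Under these assumptions, the same 5% tumor cell content supports a wild-type call only in the sample that amplifies below the Ct cutoff, which mirrors the abstract's point that amplification capacity and tumor DNA content both affect interpretation.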

    Computational Advances in Drug Safety: Systematic and Mapping Review of Knowledge Engineering Based Approaches

    Drug Safety (DS) is a domain with significant public health and social impact. Knowledge Engineering (KE) is the Computer Science discipline elaborating on methods and tools for developing “knowledge-intensive” systems, depending on a conceptual “knowledge” schema and some kind of “reasoning” process. The present systematic and mapping review aims to investigate KE-based approaches employed for DS and highlight the introduced added value as well as trends and possible gaps in the domain. Journal articles published between 2006 and 2017 were retrieved from PubMed/MEDLINE and Web of Science® (873 in total) and filtered based on a comprehensive set of inclusion/exclusion criteria. The 80 finally selected articles were reviewed on full-text, while the mapping process relied on a set of concrete criteria (concerning specific KE and DS core activities, special DS topics, employed data sources, reference ontologies/terminologies, and computational methods, etc.). The analysis results are publicly available as online interactive analytics graphs. The review clearly depicted increased use of KE approaches for DS. The collected data illustrate the use of KE for various DS aspects, such as Adverse Drug Event (ADE) information collection, detection, and assessment. Moreover, the quantified analysis of using KE for the respective DS core activities highlighted room for intensifying research on KE for ADE monitoring, prevention and reporting. Finally, the assessed use of the various data sources for DS special topics demonstrated extensive use of dominant data sources for DS surveillance, i.e., Spontaneous Reporting Systems, but also increasing interest in the use of emerging data sources, e.g., observational healthcare databases, biochemical/genetic databases, and social media. 
Various exemplar applications were identified with promising results, e.g., improvement in Adverse Drug Reaction (ADR) prediction, detection of drug interactions, and novel ADE profiles related to specific mechanisms of action. Nevertheless, since the reviewed studies mostly concerned proof-of-concept implementations, more intense research is required to increase the maturity level that is necessary for KE approaches to reach routine DS practice. In conclusion, we argue that efficiently addressing DS data analytics and management challenges requires the introduction of high-throughput KE-based methods for effective knowledge discovery and management, ultimately resulting in the establishment of a continuous learning DS system.

    Integrated methods for analysis and processing of biological data applied on gene expression

    The main objective of this thesis is the development of integrated methods for managing and processing biological data with specific emphasis on their application to computational prediction of gene structures in eukaryotes. Most biological analysis tools developed so far are characterized by multi-level heterogeneities that make combinatorial usage and further analysis difficult. Scientific workflow management systems offer integrated environments for tool orchestration in successive steps, though the selection of the best-fitting tool in each step remains an important issue, considering that the outcomes of the underlying computational models very frequently differ. The integration architecture proposed in this thesis offers transparent access to publicly available tools that fulfil common functions, enabling comparative post-analysis of their outcomes. Specifically, the proposed architecture consists of: a) appropriate wrapping/parsing modules, b) a common schema for describing the results of the predictive modeling, c) combinatorial visualization modules, and d) query formation and execution modules that apply to multiple commonly-described outcomes. The applicability of the architecture was evaluated on a set of ab initio gene predictors. The modular design of the architecture allows for additional functionalities to be implemented, as well as for the incorporation of supplementary schemas describing relevant tools. In this context, the architecture was extended by embodying schema descriptions of publicly available tools that predict specific signal sensors. Signal sensors define the boundaries of functional tracts within a genomic region, and their computational prediction, coupled with the outcome of gene structure predictors, can be used to increase the efficiency of the underlying learning methods. Splice sites are important signal sensors that define the synthesis of the protein product. 
Splice sites are located at the beginning and end of an intron and signal the coding regions that are going to be translated into proteins. The biological mechanism that recognizes splice sites involves multiple, complex interactions among adjacent and non-adjacent nucleotides. Our incomplete knowledge of these interactions hinders predictive modeling of splice sites. This thesis presents a hybrid method for predicting splice sites that consists of two successive classification steps. The first step is undertaken by a Gaussian support vector machine that is trained on probabilistic data descriptions, using different feature selection methods. The second step combines the evidence of specific features drawn from relevant published studies with the probability estimates of the first classification step, in order to induce a binary decision tree. Finally, the thesis proposes different types of analysis of alternatively spliced exons and their neighboring intronic regions, in order to investigate potential discriminative features that differ between constitutive and alternative gene expression. The results of the analysis provide important evidence that is biologically useful, while from the computational point of view they could be used to feed the proposed hybrid identification method, in order to predict alternative splice sites.
The subject of this thesis is the development of techniques for integrating web-based tools and processing biological data, with the aim of addressing the problems of computational prediction of gene structures in eukaryotic organisms. The heterogeneity of biological data analysis tools is one of the main obstacles to the combined use and further exploitation of the results they produce. Workflow management systems offer integrated environments through which tools can be executed in succession in a transparent manner; however, the selection of the most suitable tool at each execution step remains an important problem, given that the results of the computational models implemented by the tools of the same step very often differ. The integration architecture proposed in the thesis addresses the transparent use of web-based tools that perform common functions and aims to enable further management of the results they produce. The architecture was applied to a set of gene structure prediction tools and offers: a) the ability to submit queries to these tools in a transparent manner, b) a unified description of the results based on a common schema, c) options for combined visualization of the functional regions identified, and d) the ability to process the results through a mechanism for submitting combinatorial queries. The modular design of the architecture allows the incorporation of additional functions as well as schemas describing related tools. Thus, as an extension of the architecture, schemas were incorporated that describe web-based tools predicting specific signal elements. These elements delimit the functional regions of genes and can be used in combination with gene structure prediction tools to improve the accuracy of the computational models on which they have been trained. The signal elements that largely determine the synthesis of the protein product are the splice sites. Splice sites mark the beginning and end of the intronic regions of a gene and consequently determine the functional regions that will subsequently be translated into the corresponding amino acid chain. 
The biological mechanism of splice site recognition in eukaryotic organisms involves multiple interactions among adjacent and non-adjacent nucleotides. The incomplete understanding of these interactions makes it difficult to implement effective computational techniques for splice site prediction. The thesis proposes a hybrid approach to splice site recognition that comprises two successive steps. In the first step, a Gaussian support vector machine is used, which is trained following two different feature selection approaches. In the second, the classification results of the first step are combined with evidence drawn from relevant published studies, and a binary decision tree is constructed that yields the final estimate of the strength of a candidate splice site. In its final part, the thesis proposes various ways of analyzing alternatively spliced exons and their adjacent intronic regions, in order to investigate the distinct features that differentiate constitutive from alternative forms of gene expression. The results of the analyses provide important evidence that is biologically very useful, because it has not been studied experimentally, while from the computational point of view they could be used by the hybrid model proposed in the thesis to predict alternative splice sites.

    Managing Evidence from Multiple Gene Finding Resources via an XML-based Integration Architecture

    While the biological processes underlying gene expression are still under experimental research, computational gene prediction techniques have reached a high level of sophistication with the employment of efficient intrinsic and extrinsic methods that identify protein-coding regions within query genomic sequences. Their ability to delineate the exact exon boundaries, though, is characterized by a trade-off between sensitivity and specificity and is still prone to alternations in gene regulation during transcription and splicing and to inherent complexities introduced by the implemented methodology. Evaluation studies have shown that combinatorial approaches exhibit improved accuracy levels through the integration of evidence data from multiple resources that are further assessed in order to end up with the most probable gene assembly.

    In silico structural analysis of sequences containing 5-hydroxymethylcytosine reveals its potential as binding regulator for development, ageing and cancer-related transcription factors

    The presence of 5-hydroxymethyl cytosine in DNA has been previously associated with ageing. Using in silico analysis of normal liver samples we presently observed that in 5-hydroxymethyl cytosine sequences, DNA methylation is dependent on the co-presence of G-quadruplexes and palindromes. This association exhibits discrete patterns depending on G-quadruplex and palindrome densities. DNase-Seq data show that 5-hydroxymethyl cytosine sequences are common among liver nucleosomes (p < 2.2 × 10^−16) and threefold more frequent than nucleosome sequences. Nucleosomes lacking palindromes and potential G-quadruplexes are rare in vivo (1%) and nucleosome occupancy potential decreases with increasing G-quadruplexes. Palindrome distribution is similar to that previously reported in nucleosomes. In low and mixed complexity sequences 5-hydroxymethyl cytosine is frequently located next to three elements: G-quadruplexes or imperfect G-quadruplexes with CpGs, or unstable hairpin loops (TCCCAY6TGGGA) mostly located in antisense strands or finally A-/T-rich segments near these motifs. The high frequencies and selective distribution of pentamer sequences (including TCCCA, TGGGA) probably indicate the positive contribution of 5-hydroxymethyl cytosine to stabilizing the formation of structures that are unstable in the absence of this cytosine modification. Common motifs identified in all 5-hydroxymethyl cytosine-containing sequences exhibit high homology to recognition sites of several transcription factor families: homeobox, factors involved in growth, mortality/ageing, cancer, neuronal function, vision, and reproduction. We conclude that cytosine hydroxymethylation could play a role in the recognition of sequences with G-quadruplexes/palindromes by forming epigenetically regulated DNA ‘springs’ and governing expansions or compressions recognized by different transcription factors or stabilizing nucleosomes. The balance of these epigenetic elements is lost in hepatocellular carcinoma.
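As a rough illustration of the kind of in silico scan this abstract refers to, the Python sketch below locates potential G-quadruplex motifs and reverse-complement palindromes in a DNA string. The abstract does not say which tools or motif definitions were used, so the regular expression (four runs of at least three guanines separated by loops of 1 to 7 nucleotides, a widely used heuristic), the fixed palindrome length, and all function names are assumptions.

```python
import re

# Assumed heuristic pattern for a potential G-quadruplex on the given strand:
# four G-runs (>= 3 G) separated by three loops of 1-7 nucleotides.
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")

# Translation table used to complement a DNA sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_g4_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (start, motif) for non-overlapping matches of the G4 heuristic."""
    return [(m.start(), m.group()) for m in G4_PATTERN.finditer(seq.upper())]


def find_palindromes(seq: str, length: int = 6) -> list[tuple[int, str]]:
    """Return (start, window) for windows equal to their own reverse complement."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - length + 1):
        window = seq[i:i + length]
        if window == reverse_complement(window):
            hits.append((i, window))
    return hits


if __name__ == "__main__":
    print(find_g4_motifs("TTGGGAGGGTGGGAGGGTT"))
    print(find_palindromes("AAGAATTCAA"))
```

The sketch scans only the strand it is given; a fuller analysis would also scan the reverse complement and would need a scoring scheme for imperfect G-quadruplexes, which the heuristic above does not capture.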

    Age-dependent methylation in epigenetic clock CpGs is associated with G-quadruplex, co-transcriptionally formed RNA structures and tentative splice sites

    Horvath’s epigenetic clock consists of 353 CpGs whose methylation levels can accurately predict the age of individuals. Using bioinformatics analysis, we investigated the conformation, energy characteristics and presence of tentative splice sites of the sequences surrounding the epigenetic clock CpGs, in relation to the median methylation changes in different ages, the presence of CpG islands and their position in genes. Common characteristics in the 100 nt sequences surrounding the epigenetic clock CpGs are G-quadruplexes and/or tentative splice site motifs. Median methylation increases significantly in sequences which adopt less stable structures during transcription. Methylation is higher when CpGs overlap with G-quadruplexes than when they precede them. Median methylation in epigenetic clock CpGs is higher in sequences expressed as single products rather than in multiple products and those containing single donors and multiple acceptors. Age-related methylation variation is significant in sequences without G-quadruplexes, particularly those producing low stability nascent RNA and those with splice sites. CpGs in sequences close to transcription start sites and those which are possibly never expressed (hypothetical proteins) undergo a similar extent of age-related median methylation decrease and increase. Preservation of methylation is observed in CpG islands without G-quadruplexes, contrary to CpGs far from CpG islands (open sea). Sequences containing G-quadruplexes and RNA pseudoknots, determining the recognition by H3K27 histone methyltransferase, are hypomethylated. The presented structural DNA and co-transcriptional RNA analysis of epigenetic clock sequences foreshadows the association of age-related methylation changes with the principal biological processes of DNA and histone methylation, splicing and chromatin silencing.

    Unraveling Drug Response from Pharmacogenomic Data to Advance Systems Pharmacology Decisions in Tumor Therapeutics

    The availability of systematic drug response registries for hundreds of cell lines, coupled with the comprehensive profiling of their genomes/transcriptomes, has enabled the development of computational methods that investigate the molecular basis of drug responsiveness. Herein, we propose an automated, multi-omics systems pharmacology method that identifies genomic markers of anti-cancer drug response. Given a cancer type and a therapeutic compound, the method builds two cell line groups on the antipodes of the drug response spectrum, based on the outer quartiles of the maximum micromolar screening concentration. The method intersects cell lines that share common features in their mutation status, gene expression levels or copy number variants, and a pool of drug response biomarkers (core genes) is built, using genes with mutually exclusive alterations in the two cell line groups. The relevance to the drug target pathways is then quantified, using the combined interaction score of the core genes and an accessory protein network having strong, physical/functional interactions. We demonstrate the applicability and effectiveness of our methodology in three use cases that end up in known drug-gene interactions. The method steps into explainable bioinformatics approaches for novel anticancer drug-gene interactions, offering high accuracy and increased interpretability of the analysis results. Availability: https://github.com/PGxAUTH/PGxGDSC
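The grouping and core-gene steps described in this abstract can be sketched in a few lines of Python. This is a simplified reading under stated assumptions, not the code in the linked repository: the function names, the single generic "altered genes" set per cell line (standing in for mutation, expression and copy number features), and the definition of mutual exclusivity (altered in every line of one group and in no line of the other) are all hypothetical.

```python
from statistics import quantiles

# Hypothetical sketch: names, data layout and exclusivity rule are assumptions.

def split_by_outer_quartiles(response: dict[str, float]) -> tuple[set[str], set[str]]:
    """Return (low, high) groups: cell lines at or below the first quartile
    and at or above the third quartile of the response values."""
    q1, _, q3 = quantiles(list(response.values()), n=4)
    low = {line for line, value in response.items() if value <= q1}
    high = {line for line, value in response.items() if value >= q3}
    return low, high


def exclusive_genes(group: set[str], other: set[str],
                    alterations: dict[str, set[str]]) -> set[str]:
    """Genes altered in every cell line of `group` and in no line of `other`."""
    if not group:
        return set()
    shared = set.intersection(*(alterations[line] for line in group))
    seen_in_other = set().union(*(alterations[line] for line in other))
    return shared - seen_in_other


def core_genes(response: dict[str, float],
               alterations: dict[str, set[str]]) -> set[str]:
    """Union of the genes exclusive to either outer-quartile group."""
    low, high = split_by_outer_quartiles(response)
    return (exclusive_genes(low, high, alterations)
            | exclusive_genes(high, low, alterations))


if __name__ == "__main__":
    # Toy response values and alteration sets, invented for illustration.
    response = {"A": 0.1, "B": 0.2, "C": 0.5, "D": 0.6,
                "E": 0.7, "F": 0.8, "G": 5.0, "H": 9.0}
    alterations = {"A": {"TP53", "KRAS"}, "B": {"KRAS", "BRAF"},
                   "C": set(), "D": set(), "E": set(), "F": set(),
                   "G": {"EGFR", "TP53"}, "H": {"EGFR"}}
    print(sorted(core_genes(response, alterations)))
```

In the toy data, TP53 is excluded because it is altered in both groups, which is the behavior the mutual-exclusivity criterion is meant to capture. The pathway relevance scoring mentioned in the abstract is outside the scope of this sketch.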

    Table_3_Computational Advances in Drug Safety: Systematic and Mapping Review of Knowledge Engineering Based Approaches.DOCX

    Drug Safety (DS) is a domain with significant public health and social impact. Knowledge Engineering (KE) is the Computer Science discipline elaborating on methods and tools for developing “knowledge-intensive” systems, depending on a conceptual “knowledge” schema and some kind of “reasoning” process. The present systematic and mapping review aims to investigate KE-based approaches employed for DS and highlight the introduced added value as well as trends and possible gaps in the domain. Journal articles published between 2006 and 2017 were retrieved from PubMed/MEDLINE and Web of Science® (873 in total) and filtered based on a comprehensive set of inclusion/exclusion criteria. The 80 finally selected articles were reviewed on full-text, while the mapping process relied on a set of concrete criteria (concerning specific KE and DS core activities, special DS topics, employed data sources, reference ontologies/terminologies, and computational methods, etc.). The analysis results are publicly available as online interactive analytics graphs. The review clearly depicted increased use of KE approaches for DS. The collected data illustrate the use of KE for various DS aspects, such as Adverse Drug Event (ADE) information collection, detection, and assessment. Moreover, the quantified analysis of using KE for the respective DS core activities highlighted room for intensifying research on KE for ADE monitoring, prevention and reporting. Finally, the assessed use of the various data sources for DS special topics demonstrated extensive use of dominant data sources for DS surveillance, i.e., Spontaneous Reporting Systems, but also increasing interest in the use of emerging data sources, e.g., observational healthcare databases, biochemical/genetic databases, and social media. 
Various exemplar applications were identified with promising results, e.g., improvement in Adverse Drug Reaction (ADR) prediction, detection of drug interactions, and novel ADE profiles related with specific mechanisms of action, etc. Nevertheless, since the reviewed studies mostly concerned proof-of-concept implementations, more intense research is required to increase the maturity level that is necessary for KE approaches to reach routine DS practice. In conclusion, we argue that efficiently addressing DS data analytics and management challenges requires the introduction of high-throughput KE-based methods for effective knowledge discovery and management, resulting ultimately, in the establishment of a continuous learning DS system.

    Dysregulation of Plasma miR-146a and miR-155 Expression Profile in Mycosis Fungoides Is Associated with rs2910164 and rs767649 Polymorphisms

    Diagnosis of Mycosis Fungoides (MF) may be challenging, due to its polymorphic nature. The use of miRNAs as biomarkers to assist in diagnosis has been investigated, mainly in skin lesion biopsies. The purpose of this study is to evaluate the plasma levels of miR-146a and miR-155 in MF patients and to investigate their association with SNPs of their genes. Plasma miRNAs were quantified by RT-qPCR. Genomic DNA was used for SNP genotyping by Sanger sequencing. Plasma levels of miR-146a and miR-155 were significantly higher in patients vs. controls, in early MF patients vs. controls, and in advanced vs. early MF patients. Both miRNAs’ levels were significantly higher in stage IIB vs. early-stage patients. miR-155 plasma levels were significantly higher in patients with skin tumors or erythroderma. The CC genotype (rs2910164 C>G) was significantly more frequent in healthy controls and associated with lower MF risk and lower miR-146a levels. The AA genotype (rs767649 T>A) was significantly more frequent in patients and correlated with increased MF risk and increased miR-155 levels. The combination of GG+AA was only detected in patients and was correlated with higher MF susceptibility. The increased miR-146a and miR-155 plasma levels in MF are an important finding for establishing putative noninvasive biomarkers. The presence of SNPs is closely associated with the miRNAs’ expression, and possibly with disease susceptibility.